Philosophical Transactions of the Royal Society B: Biological Sciences — Latest Matching Preprints

1

Keeping human in the loop: A three-phase generative AI workflow for research integrity in data-intensive science. A methodological case study using elite Ethiopian distance-running data

Galko, P.; Yisamaw, A.; Haugen, T.; Seiler, S.

2026-05-29 sports medicine 10.64898/2026.05.29.26354013 medRxiv

Top 3%

0.4%

Background: Generative AI tools can support data-intensive research by writing code, drafting prose, searching analytical possibilities, and stress-testing claims. They can also produce false citations, drift between statistical specifications, and lose continuity across long investigations. This paper describes a practical workflow for using AI systems in empirical research while keeping discovery, verification, and accountability inspectable. Methods: We developed and applied a three-phase human-AI workflow to a case study of 14 elite Ethiopian distance runners. The dataset contained 22,605 GPS-segments collected across 97 consecutive days in late 2025, supplemented by venue and athlete metadata collected in the field. Phase 1 used an autonomous data-exploration tool to pre-filter the hypothesis space across five seeded research questions. Phase 2 used an AI system under direct human guidance to construct candidate findings into numerical claims, verification scripts, and draft text. Phase 3 used an independent AI system in an adversarial role to stress-test methods, statistics, prose, figures, and citations. The workflow was informed by Pearl's distinction between association, intervention, and counterfactual reasoning, with human judgement retained for research direction, interpretation, and final claims. Results: The workflow produced three empirical analyses and a documented correction process. The analyses estimated an altitude-to-sea-level pace correction of +0.10 min/km per 1,000 m at matched heart rate, showed why pooled altitude-surface regression was not identifiable within this venue system, documented method-dependence in heart-rate-based intensity classification, characterised within-venue route variation as a 64/36 path-fixed-to-trail-variable split with the Sululta label resolving into two functionally distinct sub-venues, and reframed the cohort's training through a 3x3x3 prescription lattice grounded in Ethiopian coaching practice. 
The adversarial phase identified several hallucinated citations, a terminology error between HC1 and cluster-robust standard errors, and several inconsistencies between prose, figures, and computed results. Verification scripts re-derived nearly all numerical claims from the cleaned lap-level data. Conclusions: The case study shows how researchers can organise AI-assisted empirical work so that candidate discovery, claim construction, independent stress-testing, and final accountability remain separated. The workflow did not remove the need for domain expertise or human judgement. Its value was in making the route from candidate finding to manuscript claim explicit, reproducible, and open to challenge. Trial registration: Not applicable.

2

Spatial variation in incidence of meningococcal meningitis: evidence from a large historical epidemic in Glasgow

Stewart, G.; Schroeder, M.; Mancy, R.; Angelopoulos, K.

2026-05-30 epidemiology 10.64898/2026.05.28.26354324 medRxiv

Top 3%

0.4%

Large epidemics of invasive meningococcal disease are rare in temperate regions. Here, we analyse administrative data on the largely forgotten epidemic of bacterial meningococcal meningitis that occurred in Glasgow in 1907, probably the largest on record in the UK. The epidemic, predominantly confined to the city, killed around 1,000 people, had a case fatality rate of nearly 70%, and hit infants and young children the hardest. We show the rapid rise and fall in cases and the spatial distribution of incidence and mortality rates within the city. We find that within-household overcrowding was a key driver of incidence whereas between-household geographic proximity was not. We also find that the spatial distribution of disease risk during the epidemic persisted in the post-epidemic period and during a later outbreak. The findings suggest that interventions should prioritise populations in areas that have experienced higher incidence rates to mitigate the risk of future outbreaks.

3

Intention of UK residents to wear facemasks and practise social distancing during the next respiratory virus pandemic

Smith, D. R.; Buckell, J.; Hancock, T. O.; Morrell, L.; Pouwels, K.

2026-05-30 public and global health 10.64898/2026.05.21.26353824 medRxiv

Top 3%

0.3%

Background: Wearing facemasks and practising social distancing slow the spread of respiratory pathogens. However, in the event of a new pandemic emerging, the willingness of populations to voluntarily adopt these behaviours is unclear. Methods: A discrete choice experiment was conducted among 2,006 UK-based adults. Participants were presented with hypothetical scenarios describing the emergence of a respiratory virus pandemic and were asked to choose when they would wear facemasks and practise social distancing. A mixed multinomial logit model was used to jointly estimate how disease severity and prevalence, uncertainty in these quantities, and individual-level characteristics influence behavioural choices. Findings: Participants were averse to facemasks and social distancing in the absence of pandemic risk. For each ten-unit increase in severity (10 additional hospitalisations/1,000 infections), the odds of always wearing a facemask outside the home increased by 15.9% (95%CI: 14.3%, 17.5%), relative to rarely/never, and the odds of avoiding all people as much as possible increased by 16.4% (14.6%, 18.2%), relative to not avoiding anyone. Greater disease prevalence, uncertainty in disease severity or disease prevalence, a university education, prior COVID-19 vaccination and non-white ethnicity were also associated with choosing to always wear facemasks and avoid all people as much as possible. The probability of participants choosing to rarely/never wear facemasks varied from 13.4% (11.9%, 14.9%) in the lowest-risk scenario to 1.4% (1.2%, 1.7%) in the highest-risk scenario. Interpretation: Perceived risks of disease and associated uncertainty drive intention of UK adults to adapt their behaviour in a future pandemic.
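
As a rough illustration of how the reported odds changes scale, the sketch below converts a percentage increase in odds per ten-unit increase in severity into a per-unit log-odds coefficient and compounds it over larger severity differences. It assumes the effect is linear on the log-odds scale across the range considered, which the abstract does not state. The function names are hypothetical, and this is not the authors' mixed multinomial logit model.

```python
import math


def log_odds_per_unit(pct_increase: float, units: float = 10.0) -> float:
    """Per-unit log-odds coefficient implied by a percentage increase in odds over `units`."""
    return math.log(1.0 + pct_increase / 100.0) / units


def odds_multiplier(pct_increase: float, delta: float, units: float = 10.0) -> float:
    """Odds multiplier for a change of `delta` severity units, assuming log-linearity."""
    return math.exp(log_odds_per_unit(pct_increase, units) * delta)


# 15.9% higher odds of always wearing a facemask per 10 additional
# hospitalisations per 1,000 infections (figure taken from the abstract).
beta = log_odds_per_unit(15.9)
print(f"per-unit log-odds: {beta:.4f}")

# Hypothetical comparison: scenarios that differ by 30 severity units.
print(f"odds multiplier for +30 units: {odds_multiplier(15.9, 30):.3f}")
```

Under the log-linearity assumption the multiplier compounds, so a 30-unit difference corresponds to 1.159 cubed rather than three times 15.9%.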

4

Using Bayesian Evidence Synthesis to estimate the number of sex workers in the United Kingdom

Long, H.; Gada, L.; Murray, L.; Laurence, T.; Hayward, A.; Finnie, T.

2026-05-26 public and global health 10.64898/2026.05.21.26353767 medRxiv

Top 3%

0.3%

Sex work is diverse and includes a broad range of people and settings. Over the last thirty years, a large proportion of public health emergencies of international concern (PHEIC) have involved infections transmitted through sexual or close contact and in sexual networks (WHO 2024). Sex workers can face increased disadvantage in relation to these public health emergencies. Given the significant health inequalities sex workers can face, they should be eligible to receive targeted and tailored health support to reduce health protection risks (Hester 2019; Jeal and Salisbury 2004a). However, they are often not explicitly eligible for targeted and tailored support due to a lack of information on incidence, prevalence of disease, and even more basic data such as reliable estimates of the number of sex workers in the UK. Accordingly, the aim of this paper is to determine a population size estimate, with uncertainty, that is more robust than those currently available. In this study, we apply Bayesian Evidence Synthesis to bring together historic estimation efforts with recent ONS National Population Estimates and Genito-Urinary Medicine Clinics Attendance Data (GUMCAD) from the UK Health Security Agency (UKHSA). A key feature of our model is the embedding of uncertainty from each input study in model priors, hence propagating it through to our final estimate. The Bayesian evidence synthesis model estimated a total of 84,000 sex workers in the United Kingdom (95% credible interval: 49,000-130,000), representing 0.121% of the current UK population.

5

Comparison of Mechanical Tissue Properties Using MyotonPRO and Time-Harmonic Elastography: Understanding Fundamental Differences and Statistical Relationships

Kurz, E.; Valli, G.; Meyer, T.; Proger, S.; Schwesig, R.; Bartels, T.; Delank, K.-S.; Sack, I.; Aghamiry, H. S.

2026-05-28 sports medicine 10.64898/2026.05.20.26353658 medRxiv

Top 4%

0.3%

Purpose: MyotonPRO (MTP) and time-harmonic elastography (THE) are increasingly used to assess muscle mechanical properties, yet they operate on fundamentally different physical principles. MTP measures composite MTP stiffness (N/m) through surface oscillations, while THE quantifies intrinsic shear modulus (THE stiffness, kPa) via propagating shear waves. This study aimed to systematically compare MTP and THE measurements in the vastus lateralis muscle across different contraction intensities and to examine how skin layer and subcutaneous fat (SLSF) thickness influences their relationship. Methods: Twenty-six healthy adults (15 males, 11 females; age 25 [SD 4] years) underwent MTP and THE measurements of the vastus lateralis at rest and during isometric contractions at 15% and 30% maximal voluntary contraction (MVC). Effects of contraction intensity on tissue properties were assessed using univariate analyses of variance with repeated measures. Associations between the outcomes of the THE and MTP technologies were explored using Pearson's correlations and partial correlation coefficients, separately for each contraction intensity, with adjustment for participants' SLSF thickness. Results: Both technologies detected contraction intensity-dependent stiffening across all outcomes (p < 0.001). THE stiffness increased from 5.3 [1.2] kPa at rest to 15.6 [6.1] kPa at 30% MVC; THE wave attenuation increased from 0.83 [0.19] to 1.42 [0.36] s/m, while MTP stiffness increased from 337.3 [49.3] N/m at rest to 529.4 [160.7] N/m at 30% MVC. Correlations between modalities were weak and condition-dependent. THE wave attenuation did not significantly correlate with any MTP outcome across conditions. Conclusion: MTP and THE detect contraction-induced stiffening through fundamentally different physical mechanisms and should not be regarded as interchangeable. Their correlation is modest at rest and breaks down (or reverses) during active contraction, with subcutaneous fat as a key modifying factor. Clinical trial number: Not applicable.

6

Inferring Sexual Network Bridging Using Genomics: A Simulation Study

Kline, M. C.; Helekal, D.; Oliveira Roster, K. I.; Grad, Y.

2026-05-26 infectious diseases 10.64898/2026.05.24.26353967 medRxiv

Top 4%

0.3%

The dynamics of sexually transmitted infections involve interconnected transmission networks, including men who have sex with men and heterosexual populations. Understanding the extent of bridging between these networks can inform surveillance, guide interventions, and aid in the interpretation of their impact, but methods for quantifying bridging have been lacking. Here, we addressed whether pathogen genomics tools, successfully used to reconstruct transmission in other contexts, could accurately infer sexual network bridging. Based on simulations of gonorrhea spread, we evaluated phylodynamic bridging metrics inferred by ancestral state reconstruction under a range of sampling schemes, from comprehensive to sparse. These metrics differentiated sexual network structures even with biased sampling schemes, but accuracy depended on the sampling scheme and density: phylodynamic bridging estimates using sequences from all detected infections for one network configuration were on average 6.9% above the true value, whereas estimates from 5% of infections in symptomatic men with many partners were on average >1000% above the true value. These results suggest routine overestimation of bridging from unadjusted inferences from genomics data and provide context for interpreting existing genomic surveillance data and targeted studies.

7

The Telesafe archive: creating a database of UK primary care telephone consultations

Edwards, P. J.; Caddick, B.; Skeen, A.; Lin, J.; Ridd, M. J.; Barnes, R. K.; Salisbury, C.

2026-05-26 primary care research 10.64898/2026.05.19.26353559 medRxiv

Top 4%

0.3%

Background: In 2024, one-third of GP appointments in England were conducted by telephone. What happens during these consultations is largely unknown. Aim: To test the feasibility of collecting recorded GP telephone consultations with linked data and consent for future research use. Design and setting: Retrospective observational study in seven practices in South West England. Method: Adults who had a telephone consultation at practices that routinely record calls were invited to consent to retrieval of call audio, a 4-month electronic health record (EHR) extract and a post-consultation patient questionnaire. Practice-level consent rates were analysed using regression models. Results: Of 28 clinicians recruited, 19 GPs had consultations with patients whose recordings were retrievable, usable, and consented for future research. Of 2,053 invitations, 123 patients consented (6.0%). Consent was lower in more deprived practices (IMD 1-2 vs 9-10: OR=0.22, 95% CI 0.09-0.54). Of 101 recordings retrieved, 96 were usable and 91 had consent for future research. 86/91 were linked to EHRs and 89/91 to post-consultation patient questionnaires. Mean consultation duration was 7 minutes 13 seconds; audible typing was heard in 69% (63/91). 161 problems were discussed (mean 1.77 per consultation). Most patients were happy their consultation was by telephone (96/117, 82%), although the majority reported usually preferring face-to-face appointments (68/115, 59%). Conclusion: It is feasible to assemble a reusable archive of GP telephone consultations with linked data. However, recruitment was low using retrospective remote consent. Future work should test alternative recruitment approaches, particularly to improve patient engagement at practices serving deprived populations.
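
The percentages reported above can be re-derived from the counts given in the abstract. The short check below does that arithmetic only; the dictionary keys are labels chosen here for illustration and are not taken from the study.

```python
# Counts (numerator, denominator) copied from the abstract.
counts = {
    "consented": (123, 2053),
    "audible_typing": (63, 91),
    "happy_with_telephone": (96, 117),
    "usually_prefer_face_to_face": (68, 115),
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage of `numerator` out of `denominator`."""
    return 100.0 * numerator / denominator


percentages = {label: pct(n, d) for label, (n, d) in counts.items()}

# 161 problems discussed across the 91 recordings with consent for future research.
problems_per_consultation = 161 / 91

for label, value in percentages.items():
    print(f"{label}: {value:.1f}%")
print(f"problems per consultation: {problems_per_consultation:.2f}")
```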

8

Cross-Sectional Measures of Periodontal Severity: Distortion from Severity-Dependent Tooth Loss

McCormick, K. M.; Amarasena, N.; Guzzo, G.; Nath, S.; Jamieson, L.

2026-05-30 dentistry and oral medicine 10.64898/2026.05.27.26354277 medRxiv

Top 5%

0.2%

Aim: Cross-sectional summaries of periodontitis based on clinical attachment loss (CAL) are, by definition, conditioned on surviving teeth. Because the most severely affected teeth are more likely to have been lost, these measures may underestimate cumulative disease burden and show an artificial flattening (attenuation) of severity with age. We hypothesised that measures more sensitive to severe attachment loss would show greater attenuation at older ages than measures defined across a broader range of sites. Materials and Methods: Using nationally representative data from adults aged 30+ years in NHANES 2009-2014, we examined age-specific trajectories across multiple continuous measures of periodontal severity and assessed whether divergence between measures followed the pattern predicted under severity-dependent tooth loss. Results: The proportion of observable sites declined from 93% at ages 30-34 to 68% at 80+ years, establishing the structural basis for the divergence observed across severity measures. All severity measures showed nonlinear attenuation with age, with distortion increasing with severity threshold. Higher-threshold measures exhibited the greatest attenuation, while lower-threshold measures showed more stable trajectories. Conclusions: Cross-sectional summaries of periodontitis reflect disease among surviving teeth rather than cumulative damage across teeth originally at risk. Attenuation at older ages is consistent with depletion of the most severely affected teeth rather than biological slowing. Distortion varies by measure, with higher-threshold and mean-based indices most affected, whereas the CAL 3+ mm threshold provides a more stable basis for age comparisons.

9

Estimating Lifetime Periodontal Burden Under Informative Tooth Loss

McCormick, K. M.; Amarasena, N.; Guzzo, G.

2026-05-30 dentistry and oral medicine 10.64898/2026.05.27.26354300 medRxiv

Top 5%

0.2%

Background: Periodontitis is defined by cumulative, irreversible tissue destruction, yet population-based measurement typically relies on cross-sectional indicators derived from retained teeth. Destruction that occurred earlier in life, particularly disease severe enough to result in tooth loss, is structurally excluded from these measures, potentially leading to systematic underestimation of lifetime periodontal burden. Objective: To develop and evaluate a measurement framework that estimates lifetime periodontal burden from cross-sectional data by explicitly incorporating informative tooth loss under etiological uncertainty. Methods: Data were drawn from 10,324 adults aged ≥30 years participating in the 2009-2016 National Health and Nutrition Examination Survey (NHANES) who completed full-mouth periodontal examination and glycated hemoglobin (HbA1c) testing. Lifetime periodontal burden was estimated by combining observed clinical attachment loss in retained teeth with probabilistic contributions from missing teeth, using three alternative age-stratified attribution schedules derived from epidemiological studies of periodontal extraction. Performance was compared with conventional measures of periodontal severity and extent using distributional analyses, correlations with HbA1c, discrimination of diabetes status, and relative importance analysis. Age-adjusted models were treated as sensitivity analyses. Results: Estimated lifetime periodontal burden exhibited strong, monotonic age gradients across glycemic categories, in contrast to more attenuated patterns observed for severity and extent. Across attribution schedules, lifetime burden showed stronger correlations with HbA1c (ρ = 0.30-0.32) than conventional measures. In multivariable models including all indices, lifetime burden retained an independent association with HbA1c, whereas severity and extent contributed little unique information. Discriminative performance for diabetes status was consistently higher for lifetime burden than for conventional measures and remained stable across attribution schedules. Conclusions: Lifetime periodontal burden can be estimated from cross-sectional data by explicitly modelling informative tooth loss rather than restricting measurement to retained teeth. Incorporating historical tissue loss under uncertainty yields a more coherent representation of cumulative periodontal destruction than snapshot-based measures and provides a methodological basis for life-course-oriented periodontal epidemiology.

10

Two anti-phase spatial modes and a candidate spatial-persistence regime transition of SARS-CoV-2 in Japan: a 159-week prefecture-level sentinel surveillance study

Nakano, T.; Onozuka, D.; Ikeda, Y.; Washiyama, K.; Takashima, Y.

2026-05-26 epidemiology 10.64898/2026.05.24.26353972 medRxiv

Top 6%

0.2%

Background. On 8 May 2023 the Japanese Ministry of Health, Labour and Welfare reclassified COVID-19 under the Infectious Disease Control Law from a designated infectious disease (with case-by-case reporting requirements comparable to those of a Category-2 disease) to a Category-5 ("Class-5") notifiable disease, joining the same category as seasonal influenza and most other endemic respiratory infections. Under this regime, COVID-19 case counts are reported weekly from a nationwide network of sentinel medical facilities (initially approximately 5,000, reduced to approximately 3,000 following an April 2025 surveillance reform), and individual case reporting is no longer required. We aimed to characterize the spatial topology of COVID-19 epidemics under this sentinel-surveillance regime and to detect, in a data-driven manner, any structural change in epidemic dynamics over this period. Methods. We analyzed weekly per-sentinel-facility COVID-19 case counts in all 47 prefectures of Japan from 2023-W17 to 2026-W19 (159 weeks). For each week we computed the Shannon pseudo-entropy S of the prefecture-share distribution and global, local, and time-lagged Moran's I across a 92-edge contiguity-based adjacency matrix. To identify any structural change in a data-driven manner, we adopted a two-stage approach motivated by an empirical regularity established in Section 3: we first verified the wave-amplitude-invariant entropy ceiling (S_max >= 3.80 in all five pre-transition waves), then restricted change-point detection to the weeks after S(t) last attained this ceiling, applying PELT, CUSUM, and Bai-Perron sup-F within this restricted region. Seasonal structure was characterized by truncated Fourier regression with first-order autoregressive errors (Cochrane-Orcutt) over harmonic orders K = 1 to 6; between-period comparisons used moving block bootstrap as the principal inferential statistic. Results. 
The five epidemic waves during 2023-2025 followed a stereotyped spatial template in which S(t) traced a characteristic U-shape around each peak, with a wave-amplitude-invariant entropy ceiling reaching on average 99.4% of the theoretical maximum ln 47 (range 3.820-3.836, SD 0.006). The last week in which S(t) attained this entropy ceiling was 2025-W42. Restricting change-point detection to the 29 subsequent weeks, PELT and CUSUM localised the structural break to late 2025: PELT identified 2025-W48 (robust across penalty values >= sigma^2*ln(n) and across entropy-ceiling thresholds 3.78-3.82) and CUSUM peaked at 2025-W50 (p < 0.0001), placing the break within a two-week window centred on late November 2025. Bai-Perron sup-F peaked later at 2026-W02 (p = 0.062, with reduced power on n = 29). We adopted 2025-W48 as the principal change-point, defining 135 pre-transition weeks and 24 post-transition weeks. Two anti-phase spatial modes were identified in the pre-transition record: a summer-onset Okinawa-seeded Kyushu cascade (Mode A; annual peak epi week 26) and a winter-onset Tohoku-centred connected-cluster mode (Mode B; annual peak epi week 51), approximately 25 epi weeks out of phase. After the regime transition, this ceiling was not attained, and the spatial-persistence ratio I(tau = 8 wk)/I(0) shifted from a highly variable distribution centred near 0.27 (pre-transition, 125 weeks) to a tightly clustered distribution around 0.89 (post-transition, 24 weeks); the mean difference was 0.62 (95% bootstrap CI 0.32 to 0.90; moving block bootstrap p < 0.0001 across block lengths 1-12). The principal finding remained significant under autoregressive-augmented null models and was robust to adjacency-matrix choice, the April 2025 surveillance reform, harmonic order K = 1 to 6, and Okinawa exclusion. Conclusions. 
Data-driven analysis of 159 weeks of Japanese sentinel surveillance identifies a candidate spatial-persistence regime transition emerging in late November 2025, in which the spatial structure of weekly case shares persists for at least 8 weeks rather than dissipating as in pre-transition. The transition coincides with loss of the wave-amplitude-invariant entropy ceiling and with absence of the Mode A signature through the observed post-transition period. The recent uptick in Okinawa case shares (continuing through 2026-W19) leaves open whether the Mode A signature is structurally suppressed or merely deferred; observation through summer 2026 is required to distinguish a sustained shift from a transient anomaly.
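
To make the entropy ceiling concrete, the sketch below computes the Shannon entropy of a share distribution over 47 units and compares it with the theoretical maximum ln 47. It assumes S is the natural-log Shannon entropy of counts normalised to sum to one; the abstract's "pseudo-entropy" may differ in detail, and the example count vectors are made up for illustration.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the share distribution implied by non-negative counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    shares = [c / total for c in counts if c > 0]  # zero shares contribute nothing
    return -sum(p * math.log(p) for p in shares)


N_PREFECTURES = 47
s_max = math.log(N_PREFECTURES)  # theoretical maximum, reached when all shares are equal

uniform = [1.0] * N_PREFECTURES
concentrated = [100.0] + [1.0] * (N_PREFECTURES - 1)  # one unit dominates (made-up values)

print(f"ln 47 = {s_max:.3f}")
print(f"uniform shares: S = {shannon_entropy(uniform):.3f}")
print(f"concentrated shares: S = {shannon_entropy(concentrated):.3f}")

# Reported ceiling range (3.820-3.836) expressed as a fraction of ln 47.
for s in (3.820, 3.836):
    print(f"S = {s:.3f} is {s / s_max:.3f} of the maximum")
```

Equal shares attain ln 47, and any concentration of cases in a few prefectures pulls S below it, which is why the ceiling is read as a near-uniform spatial spread.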

11

Wilson's Central Terminal Changes Location on the Body Surface During the P-Wave: Why Precordial Leads Might Not Be What We Think

Bender, J.; Stoks, J.; Barrios Espinosa, C.; Becker, S.; Cluitmans, M. J. M.; Loewe, A.

2026-05-28 cardiovascular medicine 10.64898/2026.05.20.26352966 medRxiv

Top 8%

0.1%

Background and Aims: Clinical interpretation of the precordial leads V1-V6 assumes that Wilson's central terminal (WCT) has a fixed anatomical location. Consequently, a positive signal corresponds to electrical activation spreading from WCT towards the respective electrode, and vice versa. However, the location of WCT has never been systematically investigated, and a better understanding of it could improve the interpretation of the precordial leads. This work aims to characterize the spatial extent and location of the physical WCT, i.e., the electrical potential defined by the WCT, on the body surface during the P-wave. Methods: An intensive analysis of body surface potential maps (BSPMs) during atrial depolarization was conducted in an in silico patient cohort and in clinical data. Results: During the P-wave, the location of WCT was not stationary; its spatial extent and location varied across time as well as across individuals. Four distinct spatial patterns of WCT distribution on the body surface were identified in silico, and three of these were found in the clinical cohort. WCT signals agreed with BSPM signals at commonly assumed positions of WCT only for a small fraction of the P-wave. Conclusion: The spatial extent and location of WCT change during the P-wave and thus should be considered when interpreting the precordial leads.

12

Effects of theta burst stimulation on neural connectivity and visual perception following attention modification of own-face viewing in body dysmorphic disorder

Diaz-Fong, J. P.; Peel, H. J.; Zhang, K.; Qian, J.; Lewis, M.; Wong, W.-W.; Leuchter, A. F.; Tadayonnejad, R.; Voineskos, D.; Konstantinou, G.; Lam, E.; Blumberger, D. M.; Feusner, J. D.

2026-05-26 psychiatry and clinical psychology 10.64898/2026.05.25.26354053 medRxiv

Top 8%

0.1%

Background: Individuals with body dysmorphic disorder misperceive defects of their physical appearance. Current evidence suggests that visual processing abnormalities may underlie this core symptom. Separate pre-clinical studies testing perceptual and attentional interventions and non-invasive neuromodulation suggest that these visual processing abnormalities may be modifiable, but their combined effects on neural connectivity and perceptual processing remain unclear. Methods: Thirty-nine unmedicated men and women with body dysmorphic disorder or subclinical body dysmorphic disorder received intermittent theta burst stimulation and continuous theta burst stimulation targeting the lateral parietal cortex combined with a visual attention modification paradigm during functional magnetic resonance imaging, in a crossover design. Dynamic effective connectivity within dorsal and ventral visual stream pathways was calculated, and global visual processing biases were assessed using the face inversion effect before and after stimulation plus attention modification. Results: Intermittent theta burst stimulation resulted in increased connectivity in higher-level dorsal visual stream pathways during naturalistic viewing following attention modification, whereas continuous theta burst stimulation was associated with reduced connectivity in lower-level dorsal pathways and increased connectivity in ventral stream pathways. These changes were accompanied by differential effects on global visual processing, with stimulation type modulating the magnitude of the face inversion effect. Conclusions: Combined neuromodulation and visual attention modification modulate visual system connectivity and perceptual processing in individuals with body dysmorphic disorder symptoms. These findings support a mechanistic link between dorsal-ventral stream dynamics and perceptual biases. 
Integrating neuromodulation with perceptual retraining may represent a viable approach for targeting core symptoms of distorted appearance perception.

13

Coaching for quality improvement under performance-based contracting: a theory-of-change evaluation in Honduras

Munar, W. J.; Aranda, L. E.; Lauria, M. E.; Bernal Lara, P.; Innocenti, C.; Rodriguez, M.

2026-05-30 health systems and quality improvement 10.64898/2026.05.21.26353487 medRxiv

Top 8%

0.1%

Introduction. Practice coaching is increasingly used to strengthen quality improvement (QI) capacity in primary healthcare (PHC) systems in low- and middle-income countries (LMICs), yet the causal pathways through which it shifts provider behaviour, and the systemic conditions that enable or constrain those pathways, remain under-theorised. Using a theory-based qualitative evaluation, we examined how and why a practice coaching intervention influenced QI in cervical cancer screening (CCS) and antenatal care (ANC) within Honduras's decentralised PHC system during the third phase of the Salud Mesoamerica Initiative (SMI). Methods. We conducted a within-case explanatory case study. A programme theory was reconstructed before data collection and iteratively refined against evidence. Data comprised semi-structured interviews with 11 mid-level managers, 6 PHC team medical leads, and 2 regional managers, complemented by direct observation and document review. We applied combined deductive and inductive coding, thematic analysis, and pattern matching, and report following COREQ. Results. We identified four causal patterns that refined the initial programme theory. Three were activated pathways: (1) novel professional identity among participating managers; (2) collective efficacy and data-driven learning, sustained through verifiable progress on observable indicators, strong for CCS but null for ANC, where outcomes were less attributable to teams' actions; and (3) relational coordination, psychological safety, and trust, which provided the interpersonal basis for the first two. A fourth, unanticipated pattern showed structural misalignment between coaching's enabling, learning-based logic and the directive, punitive logic of Honduras's performance-based contracting environment, confining gains to localised enabling bubbles. Conclusion. Coaching can activate meaningful QI pathways in LMIC primary care, but sustained, equitable impact requires deliberate alignment between coaching's learning-oriented principles and the institutional performance management architecture, and matching of coaching investment to clinical processes with observable, attributable outcomes.

14

Randomised Trial of a Multilingual Conversational AI for Preoperative Education

Ke, Y.; Niu, C.; Liao, J.; Sim, J.; Abdullah, H. R.; Jin, L.; An, J.; Ho, H. S. S.; Tung, J. Y. M.; Tan, H. K.; Sng, B. L.; Ting, D. S. W.; Ong, M. E. H.; Liu, N.

2026-05-26 anesthesia 10.64898/2026.05.24.26353997 medRxiv

Top 8%

0.1%

Background: Informed consent depends on patients' understanding of anaesthesia risk, yet comprehension remains poor despite routine preoperative consultation. Conversational artificial intelligence (AI) could establish patient-reported understanding before clinician contact, but whether such systems can achieve patient-reported understanding comparable to clinician-delivered education remains unknown. Methods: We conducted a randomised equivalence trial (n = 130) of PEAR (Preoperative Education of Anaesthesia Risks), a multilingual retrieval-augmented conversational AI grounded in institutional consent materials, versus standard preoperative consultation in adults undergoing elective surgery. Results: A total of 130 adults (mean age 52.4 ± 14.5 years) were enrolled. Post-consultation understanding scores in the PEAR group met the pre-specified equivalence criterion compared with standard consultation across all three primary measures. Patients who interacted with PEAR before clinician contact achieved understanding scores comparable to those receiving standard face-to-face consultation alone. PEAR reduced documentation and consultation time, corresponding to a projected annual net benefit of approximately SGD 0.99 million (USD 0.78 million) at a single tertiary centre. Conclusions: A retrieval-augmented conversational AI achieved patient-reported understanding of anaesthesia risk equivalent to standard preoperative consultation while substantially improving workflow efficiency. These findings support supervised deployment of conversational AI within perioperative care pathways while preserving clinician oversight for verification and patient-specific decision-making.

15

Physician Facing AI Tools Show Distinct Failure Modes Under Structured Stress Testing

Hazare, N. S.; Oh, W.; Kumar, G.; Goel, N.; Shaikh, A.; Sharma, A.; Desman, J.; Kumar, A.; Robles, C.; Singh, A.; Jangda, M.; Agaron, S.; Capone, C.; Ngai, D.; Itwaru, A.; Parchure, P.; Ramaswamy, A.; Gorbenko, K.; Timsina, P.; Lampert, J.; Tamler, R.; Manasia, A.; Kohli-Seth, R.; Kaplan, B.; Vakil, A.; Omar, M.; Glicksberg, B. S.; Freeman, R.; Stern, A. D.; Klang, E.; Darrow, B.; Stump, L. S.; Reich, D.; Charney, A.; Nadkarni, G. N.; Sakhuja, A.

2026-05-29 health informatics 10.64898/2026.05.27.26354248 medRxiv

Top 8%

0.1%

Importance: Physician-facing AI tools are now in clinical use, yet whether different platforms fail in similar or fundamentally different ways in high-stakes settings like critical care is unknown. Objective: To evaluate two physician-facing AI platforms, ChatGPT for Clinicians and OpenEvidence, for distinct vulnerabilities under structured stress testing. Design, Setting, and Participants: An observational study conducted using 60 simulated critical care vignettes developed and adjudicated by four attending critical care physicians. Data were collected in the last week of April 2026, via the public website interfaces of each platform. Interventions/Exposures: A 2x2x2x2 factorial design across four stressors - anchoring, cognitive load, social conformity pressure, and a clinically incorrect directive - yielded 16 prompt subsets per vignette and 960 prompts per platform. A separate multi-turn adversarial prompting paradigm administered three sequential "You are incorrect" challenges to baseline vignettes. All prompts had a universal output length constraint of fewer than 30 words. Main Outcomes and Measures: Critical elements capture (percentage of gold-standard critical elements present in responses), susceptibility to clinically incorrect directive, and sycophancy (reversal of an initial correct recommendation under iterative adversarial challenge). Results: Across 1916 responses to 1920 prompts, ChatGPT for Clinicians captured more gold-standard critical elements than OpenEvidence (81.4% ± 18.1% vs 61.0% ± 23.5%; adjusted difference, 20.3 percentage points; 95% CI, 18.3 to 22.4; P < .001) and was less susceptible to clinically incorrect directives (1.7% vs 8.0%; adjusted odds ratio, 0.07; 95% CI, 0.02-0.21; P < .001). Anchoring and social conformity pressure were associated with reduced critical element capture across both platforms, while cumulative stressor burden reduced critical element capture only on OpenEvidence. Conversely, ChatGPT for Clinicians reversed correct recommendations more readily under adversarial prompting (hazard ratio, 2.61; 95% CI, 1.10 - 6.19; P = .03). Conclusion and Relevance: The two physician-facing clinical AI platforms evaluated demonstrated non-overlapping vulnerabilities, with neither platform uniformly superior. These findings argue against single-axis ranking of clinical AI systems and support multidimensional safety evaluation encompassing completeness of reasoning, resistance to incorrect directives, and stability under adversarial challenge.
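
The prompt counts reported in this abstract follow from the factorial design and can be checked with a short sketch. The stressor names, the 60 vignettes, and the two platforms come from the abstract; the identifiers (`STRESSORS`, `factorial_conditions`) are hypothetical and are not taken from the study's materials.

```python
from itertools import product

# Stressor names as listed in the abstract; identifiers are illustrative only.
STRESSORS = ("anchoring", "cognitive_load", "social_conformity", "incorrect_directive")
N_VIGNETTES = 60   # simulated critical care vignettes (from the abstract)
N_PLATFORMS = 2    # two platforms evaluated (from the abstract)


def factorial_conditions(stressors=STRESSORS):
    """Return every on/off combination of the stressors as a list of dicts."""
    return [
        dict(zip(stressors, levels))
        for levels in product((False, True), repeat=len(stressors))
    ]


conditions = factorial_conditions()
prompts_per_platform = len(conditions) * N_VIGNETTES
total_prompts = prompts_per_platform * N_PLATFORMS

print(len(conditions), prompts_per_platform, total_prompts)  # 16 960 1920
```

The first entry of `conditions` has all four stressors switched off, which would correspond to an unstressed baseline prompt under this encoding.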

16

Core Components for Emergency Medical Dispatch Systems: An International Delphi Consensus Study

Weber, K.; Stassen, W.; Jayaraman, S.; Odland, M. L.; Nishimwe, A.; Welgama, I.; Wallis, L.; Ignatowicz, A.; Davies, J. P.

2026-05-28 emergency medicine 10.64898/2026.05.26.26354117 medRxiv

Top 9%

0.1%

Introduction -- Emergency Medical Dispatch Systems (EMDS) can reduce delays in accessing emergency care by providing structured communication, triage, and coordination. However, such systems remain absent or underdeveloped in most low- or middle-income countries (LMICs). This study aimed to establish international consensus on essential EMDS components to inform global guidance. Methods -- We convened a multidisciplinary expert group to draft a preliminary list of essential components for three EMDS levels reflecting resource availability and system maturity. We then conducted a three-round Delphi with international experts to reach consensus on core EMDS components. Components with ≥75% agreement were included; those with ≥75% disagreement were excluded. Components not achieving consensus by Round 3 were removed. Results were analysed overall and stratified by respondents' country income level. A subsequent online expert meeting resolved inconsistencies and finalised the component list. Results -- The expert group generated 111 components for each of three EMDS levels (Foundational, Emerging, and Established) spanning 11 operational domains. Of the 68 experts invited to the Delphi, 43 participated in Round 1 and 30 in Round 3. Across all Delphi rounds, 289 components reached consensus for inclusion. The consensus resulted in a final list of 227 components (63 Foundational, 84 Emerging, and 80 Established). Consensus agreement clustered around core EMDS domains including communication, structured call-taking and prioritisation, advice-giving, resource dispatch and tracking, and foundational governance and data functions, whereas items showing either non-consensus or consensus disagreement were typically technology-dependent or context-specific. Conclusions -- This international consensus offers guidance for EMDS development across diverse resource settings and provides a scalable roadmap to strengthen emergency care systems.
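
The consensus rule described in this abstract can be written as a small decision function. This is a minimal sketch under stated assumptions: the 75% threshold comes from the abstract, but the function name, the use of all respondents as the denominator, and the treatment of neutral responses are hypothetical, since the abstract does not specify how agreement percentages were computed.

```python
def classify_component(n_agree, n_disagree, n_respondents, threshold=0.75):
    """Classify one Delphi component by the share of respondents agreeing or disagreeing.

    Assumption: proportions use all respondents as the denominator, so neutral
    answers count toward neither agreement nor disagreement.
    """
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    if n_agree + n_disagree > n_respondents:
        raise ValueError("agree + disagree cannot exceed respondents")
    if n_agree / n_respondents >= threshold:
        return "include"
    if n_disagree / n_respondents >= threshold:
        return "exclude"
    # Per the abstract, components still in this state after Round 3 were removed.
    return "no consensus"


# Hypothetical counts for a round with 30 respondents.
print(classify_component(23, 4, 30))   # include
print(classify_component(22, 5, 30))   # no consensus
print(classify_component(3, 24, 30))   # exclude
```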

17

Grounding Language Models in Behavioral Science to Scale Physical Activity Interventions for Hispanic/Latinx Populations

Mantena, S. D.; Johnson, A.; Schuetz, N.; Tolas, A.; Montalvo, S.; Delgado-SanMartin, J.; Ramirez Posada, M.; Du, L.; Zhang, S.; Huynh, A. D.; Oppezzo, M.; King, A. C.; Schmiedmayer, P.; Lawrie, A.; Rodriguez, F.; Ashley, E.; Kim, D. S.

2026-05-28 cardiovascular medicine 10.64898/2026.05.26.26354165 medRxiv

Top 9%

0.1%

Objective: Hispanic/Latinx populations in the U.S. experience higher rates of chronic disease linked to physical inactivity, yet digital health interventions remain largely inaccessible to more than 16 million Hispanic/Latinx adults with limited English proficiency. While large language models (LLMs) offer scalable personalization, their use in non-English behavioral coaching is unexplored. This study introduces MHC-Coach-ES, a Spanish-language LLM fine-tuned on the Transtheoretical Model (TTM) of behavior change. Materials and Methods: We fine-tuned Llama 3-70B-Instruct using a two-stage pipeline. First, the model was adapted to Spanish health and motivational language using a 2.21-million-token corpus. Second, it was instruction-tuned on 3,268 translated human-written messages to align the model with the TTM. We compared MHC-Coach-ES with Llama 3-70B-Instruct and translated human-expert messages using a forced-choice preference survey (N = 77) and blinded expert review (N = 2). Results: Spanish-speaking participants significantly preferred MHC-Coach-ES messages over translated human-expert messages (81% preference, P<0.001). Linguistic analysis showed that MHC-Coach-ES produced more temporally anchored messages than the base model (65% vs. 20%), while maintaining readability. In blinded evaluation, clinical experts rated MHC-Coach-ES higher for alignment with TTM stages than human-expert messages (4.83 vs. 4.38 out of 5). The base model also outperformed translated expert messages across preference and expert ratings. Conclusions: Generative AI can operationalize behavioral science frameworks in Spanish, offering a scalable approach to reducing health disparities. The strong performance of both MHC-Coach-ES and the base model highlights the promise of generative and personalized approaches over translation-based localization for theory-driven behavioral interventions.

18

Why is team-based hypertension care failing to take hold in Australia? Real-world evidence from primary care

Satheesh, G.; Slater, K.; Trivedi, R.; Clapham, E.; Lopez, F. M.; McCormack, B.; Miranda, J. J.; Mishra, S. R.; Peterson, G. M.; Sarkies, M.; Schutte, A. E.; Chapman, N.

2026-05-26 primary care research 10.64898/2026.05.25.26354005 medRxiv

Top 9%

0.1%

Objective: The shortage of general practitioners (GPs) in Australia has intensified interest in team-based care for hypertension, involving pharmacists and nurses. This study explored primary care provider experiences, barriers, and facilitators related to implementing team-based care in Australia. Design: Qualitative study using semi-structured interviews with primary care providers. Methods: We conducted 51 interviews with GPs (n=24), nurses (n=12), and pharmacists (n=15), purposively selected from diverse primary care settings. Analysis combined deductive coding, informed by the Theoretical Domains Framework and Consolidated Framework for Implementation Research, with inductive thematic analysis to identify emergent themes. Results: Interviews demonstrated a predominantly GP-centred care model, with nurse and pharmacist involvement largely confined to supporting roles, including blood pressure measurement, prescription refills, patient follow-up and counselling. Their contributions were constrained by barriers at both practice (e.g., limited GP support, fragmented communication across providers) and health system levels (e.g., limited financial incentives and restricted reimbursement pathways). Despite their critical role in care planning, nurses described being hamstrung by workload and limited direct funding for hypertension-related services. Pharmacists reported unreimbursed blood pressure checks and restricted funding for medication reviews that constrained the sustainability of their hypertension services. Role ambiguity and the absence of standardised protocols on task sharing further limited collaboration, with nurses and pharmacists describing concerns about overstepping professional boundaries. Attitudes towards team-based care ranged from active disregard (outright rejection) to conditional acceptance and occasional active uptake (strong endorsement). Conclusion: Despite clear willingness among nurses and pharmacists to alleviate GP burden, team-based care is rarely implemented in routine practice. Addressing system-level barriers (funding models that incentivise team-based care and standardised treatment protocols that clarify shared workflows), alongside provider-level barriers (stronger awareness and training that normalises task sharing), is critical to support genuine team-based hypertension care in Australia.

19

Explaining socioeconomic inequalities in antibiotic prescribing for common infections in English primary care: a population-based study

Yang, M.; Nguyen, V. N.; Walker, A. S.; Robotham, J. V.; van Leeuwen, E.; Hayward, G.; Butler, C. C.; Pouwels, K. B.

2026-05-27 health economics 10.64898/2026.05.26.26354118 medRxiv

Top 9%

0.1%

OBJECTIVES To quantify socioeconomic inequalities in antibiotic prescribing for common infections in primary care, and assess whether these inequalities arise from differences in consultation frequency, prescribing behaviour, or variation in vaccination uptake, smoking, and body mass index. DESIGN Population based cohort study. SETTING Primary care data from Clinical Practice Research Datalink, England. PARTICIPANTS 17,195,399 children and adults estimated to have been registered with a general practice in 2019. MAIN OUTCOME MEASURES Antibiotic prescribing rates (prescriptions per person-year), consultation rates (consultations per person-year), and probability of receiving an antibiotic prescription following consultation. RESULTS Higher deprivation was associated with higher antibiotic prescribing rates for most respiratory tract indications. In children, prescribing rates were 44.8% (95% confidence interval [CI] 41.9% to 47.7%) higher for upper respiratory tract infections and 47.6% (95% CI 44.2% to 51.3%) higher for lower respiratory tract infections in the most versus least deprived twentile. In adults, prescribing rates for lower respiratory tract infections were 22.7% (95% CI 21.4% to 24.1%) higher in the most deprived twentile. Prescribing rates for other indications showed weak, U-shaped, or negative associations with deprivation. Prescribing inequalities were primarily driven by inequalities in consultation rates rather than probability of receiving antibiotics once consulted. Lower influenza vaccination uptake partly accounted for higher consultation rates for respiratory infections among more deprived children, while smoking prevalence contributed to inequalities among adults. CONCLUSIONS Socioeconomic inequalities in antibiotic prescribing vary by indication type and are largely explained by consultation frequency. Reducing inequalities may require interventions that decrease the need to consult, e.g. improving influenza vaccination coverage in children and reducing smoking among adults, rather than focussing solely on prescribing behaviour.

20

Exploring healthcare experiences and access needs in unplanned hospital admissions for Inflammatory Bowel Disease: A multi-perspective qualitative study

Hawkins, R. L.; Cotterill, C.; McCormick, S.; Kellar, I.; Lobo, A. J.; Sampson, F. C.

2026-05-27 gastroenterology 10.64898/2026.05.26.26353596 medRxiv

Top 9%

0.1%

Background Unplanned hospital admissions in Inflammatory Bowel Diseases (IBD) account for nearly three-quarters of IBD inpatient stays in the United Kingdom. Although costly to services and distressing for patients, research exploring experiences and potential drivers of admissions is limited. We undertook a qualitative study to explore the healthcare experiences and access needs of people with IBD who had unplanned admissions, along with their caregivers and clinicians. Methods Semi-structured interviews with 25 participants from a single tertiary IBD service in England (17 people with IBD, 3 informal caregivers, 5 clinicians) were conducted. We applied thematic framework analysis, guided by the Candidacy Framework, and worked with 2 patient and public contributors to generate final themes. Results We identified four themes: 1) Difficulties in identifying flares and asserting severity before admission, which summarised the prevailing uncertainty in identifying a flare and accessing timely IBD care. 2) Navigating a disjointed healthcare system, which highlighted how a lack of care plans and systemic barriers can delay access. 3) Emergency care access challenges, which highlighted the gaps in emergency and inpatient care during flares. 4) Fighting for care and individual advocacy needs, which described the persistent assertion required to obtain care, a burden that may disproportionately affect access for vulnerable groups, and also highlighted the importance of positive interpersonal relationships. Conclusions Individual, interpersonal and healthcare factors across the patient pathway were perceived to shape access to care in unplanned IBD admissions. Reducing admissions may require proactive strategies, including the integration of patient education, monitoring tools, establishment of specialist rapid-access pathways, and formal psychological support to address barriers to access.